Data Center Assessment Can Prevent Downtime by Identifying Weak Spots in Design, Equipment, and Operations
If you manage a data center, your performance is likely measured by your department's ability to keep power and cooling systems functioning continuously. Though a facilities-related downtime event is inevitable, facility managers face increasing pressure from senior executives to lengthen the duration of uninterrupted service. It is safe to assume this pressure will continue to grow as computer applications become more critical to organizations and their customers. An assessment can go a long way toward maximizing uptime by identifying weak spots in design, equipment, and operations before they cause downtime.
It is important to conduct a facility "check up" every three to five years, because factors such as load growth, changes in the type of load supported, the effectiveness of maintenance practices, and the age of equipment directly affect your ability to operate facility systems without incident.
If performed effectively, an assessment provides peace of mind by validating the functional capacity of infrastructure systems and the room left for growth. It will identify single points of failure before they cause surprise interruptions, determine which systems fall short of their design objectives (concurrently maintainable, fault tolerant, and so on), and confirm that systems are being used as intended. It will also examine human factors, identifying shortcomings in facility operations and IT operations relative to uptime objectives; remember that human error causes downtime 60 to 80 percent of the time. The ultimate goal of an assessment, of course, is to allow each deficiency to be addressed before it causes downtime.
Why have fewer than 10 percent of critical facility owners performed or proactively scheduled an assessment? Human nature is most likely to blame. When the computer operation has not experienced an interruption to processing in many months, management is commonly lulled into a false sense of security. In fact, the longer uptime lasts, the more it is expected to continue. Once an event does occur, it is remarkable how quickly funding becomes available to determine the root cause and to answer the question, "What else could cause an interruption to processing?"
The best strategy is to conduct facility assessments proactively and at regular intervals, identifying potential risks before they cause an interruption. The cost of an assessment pales in comparison to the cost of the downtime events it will likely help you avoid. Today, many organizations value downtime in excess of $1 million per hour ($4 million or more per event when the average recovery period is included), while assessment costs generally range from $15,000 to $75,000.
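To put those numbers side by side, here is a minimal back-of-the-envelope sketch comparing a decade of recurring assessment spending with the cost of a single avoided downtime event. The figures are illustrative only, drawn from the ranges cited above; the assumption of one incident prevented per decade is hypothetical, and you should substitute your own organization's numbers.

```python
# Illustrative comparison: recurring assessment cost vs. expected downtime cost.
# All figures are assumptions based on the ranges cited in the article.

assessment_cost = 75_000              # upper end of the $15,000-$75,000 range
assessment_interval_years = 3         # facility "check up" every three to five years

downtime_cost_per_event = 4_000_000   # ~$1M/hour valuation, including recovery period
events_avoided_per_decade = 1         # hypothetical: one incident prevented in 10 years

# Total spent on assessments over a decade
assessments_per_decade = 10 / assessment_interval_years
total_assessment_cost = assessments_per_decade * assessment_cost

# Expected downtime cost avoided over the same decade
avoided_downtime_cost = events_avoided_per_decade * downtime_cost_per_event

print(f"Assessment spend over 10 years: ${total_assessment_cost:,.0f}")
print(f"Downtime cost avoided (assumed): ${avoided_downtime_cost:,.0f}")
print(f"Ratio (avoided / spend): {avoided_downtime_cost / total_assessment_cost:.1f}x")
```

Even with these conservative assumptions, roughly $250,000 of assessment spending offsets an assumed $4 million downtime event, a return of more than fifteen to one.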
Even if you are unable to justify the proactive approach, it is still worthwhile to have an assessment plan developed and ready to implement quickly. You will be pressured to complete an assessment once an interruption does occur, and if more than a month or two passes, the memory of the pain may subside enough that assessment funding is withdrawn.
As a first step, establish the objectives by which your facility operation will be measured:
- Define desired reliability and maintainability. Is the goal continuous operation, with no planned or unplanned interruptions? Are some unplanned interruptions permissible, and how many? Is some scheduled downtime permissible for maintenance, and how much?
- Quantify the number of years of reliable service expected from your facility.
- Establish an assessment checklist to ensure your assessment results provide a comprehensive picture.
- Include both systems and processes.
The assessment team should use these objectives to qualify their findings and recommendations.